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Summary. Sequence homology in pre-divergence tRNA species revealed cofactor/adaptors cognate for 16 
amino acids derived from oxaloacetate, pyruvate, phosphoglycerate, or phosphoenolpyruvate were related. 
Synthesis path-distances of these amino acids correlated with phylogenetic depth, reflecting relative residue 
frequency in pre-divergence sequences. Both metrics were thus aligned in the four sub-families of the 
Aspartate family, and disaligned in the small Glutamate family; a functional difference was noted and seen 
to parallel synthetase duality. Amino acid synthetic order, based on path-distances, indicate NH 4 + fixer 
amino acids, Asp 1 , Asn 2 , and homologues, Glu 1 , Gin 2 , formed the first code. Together with a termination 
signal, they acquired all four triplet 4-sets in the XAN column (X, 5’ coding site; N, any 3’-base). An invariant 
mid-A conformed with pre-code translation on a poly(A) template by a ratchet-equipped ribosome resulting 
in random, polyanionic polypeptides. Code expansion occurred in a compact (mutation minimizing) column¬ 
wise pattern, (XAN)-> XCN-> XGN-> XUN; with increasing mean path-distance, (1.5)-> 4-> 5-> 7 
steps; amino acid side-chain hydrophobicity, (+6.6)-> -0.8-> -1.5-> -3.2 kcal/ mol; codon:anticodon H 
bond enthalpy (selection for bond-strength), (-12.5)-> -17.5-> -15.5-> -14.5 kcal/ mol; and precursor- 
specific 5’-base, A, oxaloacetate, G, pyruvate/oxaloacetate, U, phosphoglycerate/ oxaloacetate, C, 
oxoglutarate, forming horizontal code domains. Codon bias evidence corroborated the XCN-> XGN step in 
expansion, and revealed row GNN coevolved with ANN, on correction for overprinting. Extended surface- 
attachment (Fajan-Paneth principle) by pro-Fd[5] and bilayer partitioning by H + ATPase proteolipid-h 1 
subunit implicated expansion phase proteins in driving increases in side-chain hydrophobicity during code 
expansion. 3’-Base recruitment in pre-assigned codon boxes added six long (9- to 14-step) path amino acid, 
bearing a basic or cyclic side-chain; 3 of 4 polar, post-expansion amino acids acquired polar cluster NAN 
codons and 2 of 3 non-polar (lie 7 included) acquired non-polar cluster NUN codons, yieldng a split-box pair 
homology of p = 5.4x1 O' 3 . All eight overprinted codon boxes (GA Y R for Asp 1 , Glu 1 included) exhibit weak 
codon :anticodon FI-bond enthalpy, -14 kcal/mol or higher, in three of six distinct code enthalpy states. 
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1. Introduction 


Path-distances in the pre-LUCA reaction sequences that led to the standard set of amino acids in proteins 
(Crick, 1958) place both diacid amino acids, Asp 1 and Glu 1 , and their amides, Asn 2 and Gin 2 , at the origin of 
the genetic code (Davis, 1999). 1 Together with a peptide chain-termination signal, they form a compact, 
error-minimizing set of four homologues, synthesized on paths of minimal length, and encoded by XAN 
triplets. The path-distance metric of amino acid synthetic order thus strategically positions the first code, in 
template-directed translation, at the point of entry of N atoms, contributed by ammonium ions, into the 
ancient pathways of amino acid synthesis. These paths originated at oxaloacetate, or 2-oxoglutarate, within 
the reductive, autocatalytic citrate cycle (Wachtershauser, 1992). Hetero-polypeptides with carboxyl- and 
amide-bearing residues resulted, to form a class of polyanionic, non-replicating biopolymers that broadly 
incorporated features of the phosphate-based, aperiodic replicators (Westheimer, 1987), consistent with a 
shared chemistry having shaped the formation of both. 

An increase in Asp 1 and Asn 2 frequency with phylogenetic depth, on back-tracking to early protein residue 
sequences, led Brooks et al. (2002) to identify both amino acids as early additions to the code, consistent 
with their synthesis by 1- to 2-step reaction sequences and inclusion in the first code. Applying changes 
found to accompany phylogenetic depth to code evolution, within the pre-LUCA era, expanded the scope of 
standard protein phylogenetics, centered on analysis of post-divergence mutation-distances between 
residue sequences. In contrast to the foregoing finding, however, homologues, Glu 1 and Gin 2 , decreased in 
frequency with phylogenetic depth, suggesting they were late additions to the code. A search for the source 
of this difference in early levels of incorporation has revealed that Aspartate and Glutamate family amino 
acids differed in their role in the formation of the code. 

Reconciling these path-distance and phylogenetic-depth findings is addressed here with the broad 
objective of aligning them with the metrics of other features, to allow a comprehensive reconstruction of 
code evolution within the pre-LUCA era. They include the dependence of amino acid assignments on codon 
bonding strength (Lagerkvist, 1978; Grosjean and Westhof, 2016), formation of conspicuous ‘vertical’, mid- 

superscripts give amino acid synthesis path-length; one and three-letter amino acid, and one-letter nucleotide 
abbreviations are used (see Supplement). X, marks the triplet coding site. LUCA, last universal common ancestor. 



base clusters of polar and non-polar amino acids (Woese, 1965; Sonneborn, 1965), codon bias in ancestral 
sequences (Shepard, 1981; Diaz-Lazcoz et al., 1995; Brooks and Fresco, 2003), homology between 
cofactor/adaptor tRNA species, with post-LUCA variations removed (Davis, 2008a), and minimization of 
translation errors. A reconstruction has resulted that depicts evolution of the code as the product of an 
interplay between these factors. 

2. Imprint of amino acid synthesis path-distance on the code 

Pre-LUCA, tRNA-dependent synthesis pathways (Fig. 1) extending 1- to 2-, 2- to 8-, and 7- to 14-steps, 
from a precursor in the reductive citrate cycle (RCC), formed amino acids that successively recruited the 5’-, 
mid-, and 3’-base of encoding RNA triplets. This implies code capacity was 4, 16, and 64 at each stage, 
leaving, in principle, 60, 48, and 0 triplets unassigned; where unassigned, untranslatable triplets 
permanently stall translation (Bretscher et al., 1965), in a ‘lethal’ form of nonsense mutation. As the 
ribosome proofing center allows codons with a 3’U to pair with A, G, C, or U at tRNA N34 (‘wobble’ site), the 
initial number of untranslatable mutations reduces to 12 mid-base substitutions (XAN: A-> C, G, U; X, 5’- 
coding base; A, mid-site adenosine; N, 3’-any base), making formation of the proposed small, initial code 
far less hazardous. 

Different kinds of amino acids entered the code at each of these three stages: code origin, expansion, and 
overprinting (Davis, 2006). Diacid amino acids, Asp 1 and Glu 1 , and their amides, Asn 2 and Gin 2 , form in NH 4 + 
trapping reaction sequences that branch 1 and 2 steps from oxaloacetate, or 2-oxoglutarate, respectively. 
The four NH 4 + fixer amino acids and chain terminator (Ter) acquired triplets in all 4-sets of the XAN 
column. An invariant mid-A is seen as a vestige of pre-code translation on a poly(A) template. With the 
participation of an ancestral, type ID tRNA adaptor, charged by either diacid amino acid, and ratchet- 
equipped ribosome, polyanionic, random sequence poly(Asp‘,Glu") polypeptides result (Davis, 1999, 
2008, 2019). 

Expansion from the NH 4 + Fixers Code accompanied recruitment of codon mid-bases: C, G, U. 

Addition of ten increasingly hydrophobic amino acids, bearing hydroxyl, sulfhydryl, or alkyl side-chains 
resulted. A columnwise, XCN -> XGN -» XUN, pattern of codon recruitment added ten amino acids 
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Fig. 1. Contour map depicting growth of tRNA-dependent, pre-LUCA amino acid synthesis pathways 
in relation to code evolution. NIV-trapping reactions that branch 1 to 2 steps from 2-oxaloacetate 
(OA) and oxoglutarate (OG), in the RCC, form Asp 1 and Glu 1 , and their amides, Asn 2 and Gin 2 . Other 
amino acid precursors were from the central trunk (CT) and reductive pentose-phosphate cycle 
(RPC); FRC refers to the pre-metabolism formose reaction cycle linked to the origin of archaic 
replicators. NH 4 + fixer amino acids, and Ter signal, acquired codons in the four XAN 4-sets (X, 5’- 
coding site; N, 3’-degenerate site). Expansion added 10 simple amino acids, formed on 2- to 8-step 
paths, with increasingly hydrophobic side-chains - hydropathy free energy values are in lower table. 
Codons were recruited in a compact, mid-C G -> U order, consistent with selection for codon 
bond strength and error minimization. 3’-Base recruitment added mainly large basic and 
aromatic amino acids with 9- to 14-step paths. Weak-bonded, pre-assigned boxes were split to 
form ‘short-path/long-path’ amino acid pairs, and enhanced NAN and NUN hydropathy clusters. 
Most amino acids occur in the Asp family (gold). Pre-LUCA protein ‘code age’ estimates (below 
table) are given for pro-ferredoxin, H + -ATPase proteolipid helix-1, cell division protein, FtsZ, and 
RNA polymerase (a-subunit N-terminal). Pro-Fd and PL-hl ‘code age’ identify surface-contact 
(Fajan-Paneth effect) and membrane formation as ‘hydrophobic attractors’ in code expansion 
(Davis, 2002). Reaction sequences (see Supplement, Fig. SI) were drawn from Michal (1999). 

with mean synthesis path lengths of 4, 5, and 7 steps, respectively (Fig. 1). Mid-base recruitment, 
based on amino acid synthesis path-distances, conforms with selection targeting codon bonding 
strength (Grosjean and Westhof, 2016) during code expansion. 

3’-Codon base recruitment entailed overprinting pre-assigned code boxes. Six long-path amino acids, 
with basic or aromatic side-chains, were encoded: Arg 9 , Lys 10 , Phe 11 , Tyr 11 , His 13 , and Trp 14 , together with, 
lie 7 . Amino acids Lys 10 , His 13 , Trp 14 , and lie 7 acquired a codon pair from same family members Asn 2 , Gin 2 , 
Cys s , and Met 7 , respectively. GA Y R encodes Asp 1 and Glu 1 and as both form on 1-step paths, they depart 
from the long-path (late-comer)/short-path (early-comer) pattern of code split-boxes, and this is attributed to 
their joint participation in pre-code translation (Davis, 2019). 

Conserved residue profile of pro-ferredoxin (Pro-Fd), H + -ATPase helix-1, FtsZ (filamenting temperature- 
sensitive mutant Z protein), and RNA polymerase (DNA dependent) a-subunit-N-terminus (RNAP-a(N)) 
were matched with stages identified in code formation, to establish their ‘code age’ within the pre-LUCA 
era. Pro-ferredoxin is a 23-residue protein (Davis, 2002), reconstructed by Norgaard (2009) and Norgaard 



et al. (2009), with a stage 5.7 residue profile; corresponding to a mid-expansion phase protein. Protein 
catalyzed RNA synthesis and fluid-mosaic cell membrane proton pumps were seen to be late 
developments in code expansion (Davis, 2002). 

3. Code expansion toward a hydrophobic attractor 

Amino acids entering the code at its origin and expansion formed on 1 to 8-step paths that exhibit a length- 
dependent distribution between code columns: paths of 1.5 ± 0.4, 4.3 ± 0.5, and 6.8 ± 0.6 (m ± s.e.m.) 


NAN NCN NUN 



Fig. 2. Transition to non-polar amino acids with growth of synthesis pathways during code 
expansion. Transfer free energy, AGj (aqueous to non-polar solvent - dielectric constant 2) values, 
from Tolstrup et al. (1994), exhibit a linear decrease, -1.5 kcal/mol per step, with increases in 
synthesis path-distance, L. The 95 per cent confidence interval on the regression line is shown. 







steps occur, respectively, in columns NAN, NCN/NGN, and NUN. These amino acids additionally exhibit a 
marked path-distance dependent difference in hydropathy. A decrease of -1.5 kcal/mol per step occurs in 
amino acid side-chain transfer free energy (aqueous to non-polar solvent) with path extension, in the 
transition, NAN -> NCN/GN -» NUN (Fig. 2). 

From an initial cluster of short-path, hydrophilic amino acids, with NAN set codons, this transition 
proceeds to a cluster of comparatively long path, hydrophobic amino acids, encoded by NUN set codons. 
Significantly, it passes through NCN/ NGN encoded amino acids, with transitional hydrophobicity 
(Supplement, Fig S3), synthesized on paths of intermediate length (Fig. 1). The progressive nature of this 
transition conforms with the proposed columnwise pattern of expansion (through triplet mid-base 
recruitment) from a small, first code at NAN. This validates the reliance on amino acid synthetic order, 
based on path distance, as a measure of advancement in code evolution (Davis, 1999, 2002, 2011). 

It is apparent, from Fig. 2, that the hydropathy clusters in the code, first noted by Woese (1965), are a 
product, at least in part, of code dynamics during expansion from an initial code with a small set of polar 
amino acids, encoded by the NAN triplets. The FT ATPase, trans-membrane subunit, proteolipid helix-1 
(Davis, 2002), has a pre-LUCA residue profile matching the amino acid array in a stage-7.1 code (Fig. 1). 
Synthesis of proteins equipped to partition with a phospholipid bilayer, enabling formation of a fluid-mosaic 
cell membrane, supplies a hydrophobic attractor for code expansion. An earlier, mid-expansion phase 
hydrophobic attractor is provided by a low potential ferredoxin antecedent, pro-Fd-[5] (Davis, 2002, 

Norgaard 2009, Ncrgaard et al. 2009). Pro-Fd-5 combines a 6-residue (carboxyl-bearing) ‘foot’, <AG t > = 4.9 
± 2.39 (m ± s.e.m.) kcal/mol per residue, with a 17-residue, [4Fe-4S] cofactor binding segment, <AG T > = 
-0.82 ± 0.71 kcal/mol per residue (Supplement, Fig. S2a). Plaving all anionic residues confined to a ‘foot’, 
attached to a comparatively long, hydrophobic, cofactor-binding segment, should serve to lengthen 
attachment to a cationic mineral surface (Fajan-Paneth effect). Pro-Fd[5] thus provides a proto-type for early 
proteins that functioned as a ‘cofactor anchor’, in a scenario that envisions the genetic code arose within a 
primordial metabolism, at the surface of a cationic mineral, as envisioned by Wachtershauser (1992). 



4. Overprinting of split-boxes by homologous amino acids 

Codon G:C and A:U bp frequency at the 5’- and mid-site have long been known to be determinants in the 
split-box distribution (Lagerkvist, 1978). Figure 3a compares amino acid synthesis path-distances in intact 
and split- boxes and shows that intact boxes contain short-path amino acids, while, by contrast, split-boxes 
pair short (early) and long (late) path amino acids. An analysis of variance established that late split-box 
paths (9.5 ± 1.4 steps) significantly exceed intact, 5.4 ± 0.8 steps, and early split-box paths, 3.6 ± 1.0 
steps. This establishes that split-boxes conserve an imprint of when over-printing took place during 
code evolution. With a short-path amino acid initially assigned an intact codon box, during code origin or 
expansion, the split-box distribution indicates that, when all 16 code boxes had been allocated, increasing 
the number of amino acids incorporated into proteins required pre-assigned codons, generally a 3’-purine or 
3’-pyrimidine doublet, be reassigned. An incoming long-path (7 to 14 steps) amino acid acquired the codon 
pair, splitting the box. 
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Fig. 3. Contribution of path-length dependent over-printing and homology optimization to codon 
split- box formation, (a) Synthesis path-lengths of intact and split-box amino acids. Early- and Late- 
set amino acids refer, respectively, the short- and long-path member of each split-box pair. An 
analysis of variance established path-lengths differed significantly between sets; p = 3.9 x 10" 3 , with 
means of 5.4 + 0.8, 3.6 +1.0, and 9.5 + 1.4 (m + s.e.m.) corresponding to Intact, Early, and Late sets. 
Late-set paths exceeded both Intact (p < 5 x 10' 2 ) and Early paths (< 1 x 10' 3 ), in a Tukey’s test; with 
no significant difference between Intact and Early paths, (b) Shows assignment of homologous 
amino acids to pre-formed hydropathy clusters, at the codon box-splitting stage (3’-codon base 
recruitment). An NAN encoded cluster of polar amino acids acquired 3 of 4 polar amino acids, while 
the NUN cluster added 2 of 3 non-polar amino acids, yielding a probability, p = 5.4 x 10' 3 of code 
formation. Single-letter designations include, X, for Ter. 


Split-boxes are located within the homology clusters, at NAN (hydrophilic) and NUN (hydrophobic) code 
columns, to a statistically significant degree (Fig. 3b). All split-box codon sets (AA Y R , GA Y R , CA Y R , UA Y R ) 
(AU ya g , UU y r ), within the hydropathy clusters were distributed to homologous amino acids, (Lys 10 /Asn 2 , 
AspVGIu 1 , Tyr'VTer, Gln 2 /His 13 ) (lle 7 /Met 7 , Phe'VLeu 8 ), discounting (UA Y R , Tyr'VTer). From this perspective, 
the code is optimal, with a joint distribution probability of 5.4 x 10" 3 . 

Acquisition of AGR and UGG, by Arg 9 and Trp 14 , respectively, resulted in non-cluster split-boxes. Split 
boxes also occur within code domains - containing amino acids with the same precursor, related tRNA 
species, and contiguous codons (Davis, 2008). A factor directly linked to Lagerkvist’s rule was plainly at play 
in forming the split-box distribution. When reassignment of a codon doublet was favorable, at an advanced 
stage in code formation, amino acid homology was optimized, consistent with minimizing the deleterious 
effects of amino acid substitution, as Sonneborn (1965) envisioned. 


5. Code enthalpy states and the split-box distribution 

Codon bonding strength played a key role in code formation, as illustrated by its split-box distribution 
(Lagerkvist, 1978). Free energy determinations from the thermal dissociation of paired nucleotide 
dimers (Chen et al., 2012) were seen by Grosjean and Westhof (2016) to furnish a physical parameter 
linked to translation fidelity and, plausibly, to the order of codon recruitment during code evolution. Figure 4a 
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Fig. 4. Codon H bond strength in translation, (a) Bond formation enthalpy, from thermal stability of 
complementary nucleotide dimers (Chen et al., 2012; expanded set), is shown to decrease linearly 
with the number of N-H #, 0 bonds, n: AH = -3.09 - 2.73n, depicted within a 95 per cent confidence 
interval, (b) Estimated N-H**0, N-H #, N, and C-H**0 bond enthalpy (Table 1), in aqueous phase, 
reveal dimers with weak, A:U, mixed A:U, G:C, or strong, G:C, bp have aggregate N-H #, 0 and N-H ,# N 
enthalpy of -8.2, -11.6, and -14.4 kcal/mol, respectively. These values are within the observed 





















standard error for each category mean: weak, -8.2 ± 0.60; mixed, -11.8 ± 0.53; and strong, -13.7 
±1.51 kcal/mol (refer to Supplement, Table SI). In a conspicuous departure from observed 
enthalpies, inclusion of C-H**0 enthalpy produced values of -11.0 , -12.6, and -14.4 kcal/mol per 
dimer with weak, mixed, and strong bonds, respectively, (c) Depicts codon 5’- and mid-base 
H bonds within the A-type double-helix formed with anticodon bases B36 and B35. tRNA loop 
nucleotide, U33, additionally forms an N-H**0 bond with anticodon purine, R35, or C-H**0 bond with 
pyrimidine, Y35. Arrows indicate H bond direction and strength: “► - 2.73, —► -1.72, —► -1.03 
kcal/mol. 


depicts the distribution of three possible H bonds, in weak, mixed, and strong bp dimers. Enthalpy values, 
from the expanded set of Chen and coworkers (see Supplement, Table S2) decrease directly with the 
number of dimer N-H-0 bonds, at a rate of -2.73 kcal/mol per bond. Based on mid-range values in H bond 
interaction energy distributions obtained by Wendler et al. (2010), the comparative N-H-N and 
and C-H-0 bond enthalpy, in an isobaric, aqueous medium, is found to be -1.71 and -0.96 kcal/mol, 
respectively (Table 1). 


Table 1. Estimated H bond enthalpy 


H Bond 

Interaction 

Energy 1 

(kcal/mol) 

Ratio 

Estimated 

Enthalpy 2 

(kcal/mol) 

N-H--0 

-2.48 + 0.90 

1.0 

-2.73 

N-H--N 

-1.55 + 0.33 

0.63 

-1.71 

C-H-O 

-0.87 + 0.38 

0.35 

-0.96 


1 Mid-range H bond interaction energy values, 
within half-range limits, from Wendler et al. (2010). 

2 N-H-N and C-H--0 enthalpies estimated from 
energy ratios applied to N-H-0 enthlapy, using 
the dependence of nucleotide dimer-pair 
formation enthalpy on N-H--0 depicted in Fig. 4(a). 





Figure 4b shows the enthalpy for weak, mixed, and strong dimers, obtained by combining N bearing H 
bond estimates, from Table 1. They closely agree with those reported by Chen and coworkers. Inclusion of 
C- H**0 enthalpy results in a conspicuous departure from observed dimer bp bonding values, notably in 
weak dimers. Thermodynamic evidence thus excludes a third H bond at [2C-H**0=C2], on the A:U Watson- 
Crick face (Fig. 4c): where double-helix bp distances are 2.8 A for N-Fl**0, 3.0 A for N-FI**N (Klug, 2004), 
and 3.2 A at [C-H“0]. As Fig. 4c indicates, a 2’0-FI**C5 bond forms between an invariant tRNA anticodon 
loop U33 and anticodon Y35. Alternatively, a strong 2’0-FI**N7 bond is formed with a purine at N35 
(Grosjean and Westhof, 2016). 

Lagerkvist’s split-box ‘two out of three’ rule can be recast in terms of codon box enthalpy states. 

Symmetry between the 5’-site of strong, G/C v. C/G, and weak, A/U v. U/A, bonders yields equal enthalpy, 
reducing the number of distinguishable codon box states by half (Fig. 5a). A second symmetry arises with 
compensating 5’- and mid-site differences in mixed-strength codon boxes, as [G/C, A] = [A/U, G], and [G/C, 
U] = [A/U, C], decreasing the number of distinct code box states to six: 

(1) 5S + 2M = -19 kcal/mol [G/C],C; Ala 2 , Pro 4 

(2) 4S + 2M + W = -17 [G/A],G; Gly 5 , Arg 9 

(3) 4S + 2M = -16 [G/C],U; Val 5 , Leu 8 

[A/U],C; Ser 4 ,Thr 6 

(4) 3S + 2M + W = -14 [G/C],A; AspVGIu 1 

Gln 2 /His 13 
[A/U],G; Ser 4 , Arg 9 

Ter/Cys7Trp 14 

(5) 3S + 2M = -13 [A/U],U; Met 7 /lle 7 , Le^/Phe 11 

(6) 2S + 2M + W = -11 [A/U],A; Ter/Tyr 11 

Asn 2 /Lys 10 

where S = -3, M = -2, W = -1 kcal/mol; whole number equivalents of FI bond enthalpy in Table 1. 



States with codon:anticodon proofing center bonding enthalpy higher than -16 kcal/mol are split in 
the standard code. Split boxes pair amino acids with synthesis paths of mixed length (Fig. 5b), as noted 


(a) Code enthalpy states and split-box distribution 
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Fig.5. Codon split-box distribution in relation to proofing center enthalpy states and amino acid 
synthesis path-length, (a) G O C and A<-> U equivalence reduces the number of distinct, 5’-site 
H bond enthalpy states to two. tRNA anticodon loop U33 introduces a mid-site asymmetry, resulting 
from a strong 2’0-H..N7 bond with anticodon purine, R35, and weak 2’0-H..C5 bond with pyrimidine, 
Y35. Four distinguishable mid-site states result. The eight doublet states obtained are defined by six 
equations, where variables S, M, and W denote strong, moderate, and weak H bonds; respectively, 
N-H..O, NH..N, and C-H-O. With rounded values of -3, -2, and -1 kcal/mol (Table 1), all four weak 
codon states, bonding enthalpy exceeding -16 kcal/mol, are split (stippled boxes), (b) All intact 
boxes have short-path (up to 8 steps) amino acids (superscripts/bar give path-length), 6 of 8 split- 
boxes encode amino acids with paths of 7 to 14 steps (white bar), making box division a late event. 


in Sec. 3. Codon boxes with sub-threshold bonding strength were consequently overprinted late in code 


formation. 









































6. Compact transitional codes minimized risk of untranslatable triplets 

Figure 6 illustrates a pattern-dependent difference in the risk of mutation to an unassigned, untranslatable 
template triplet, during formation of the code. Until a triplet acquires an adaptor, it is unreadable. Mutation to 
an unassigned triplet in a transitional code can, consequently, permanently stall translation (Bretscher et al., 
1965). Minimizing the risk of this ‘lethal’ form of nonsense mutation could be expected, therefore, to have 
strongly shaped the initial pattern of codon assignments, when the number of encoded amino acids and 
their cognate cofactor/adaptor tRNA was still small. Thus, the compact ‘four amino acid’ NH 4 + Fixers Code, 
is 25 per cent less susceptible to translation-halting nonsense mutations than the semi-compact code 
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Fig. 6. Minimizing amino acid synthetic order offered a preferred track for code evolution, (a) NH 4 + 
Fixers Code with four, 1- to 2-step synthesis path amino acids and a Ter signal, saturated the NAN 
codon set. This lowered by one-quarter, from 16 to 12, the number of possible nonsense mutations, 
to an unreadable, translation-halting triplet, in comparison with encoding the four amino acids that 
optimize mean codon bond enthalpy, at -18 kcal/mol, with a semi-compact distribution of 2- to 9- 
step amino acids, (b) Expansion of the NH 4 + Fixers Code to an equally compact NCN codon set with 
four (total of 8) 2- to 6-step amino acids achieves near-optimal codon bonding, at -17.5 kcal/mol, with 
a one-third, from 12 to 8, lower rate of translation-halting mutations; indicated as, —•. 


of four amino acids maximizing codon bonding strength, within the ribosome proofing center (Grosjean and 
Westhof, 2016); the mean enthalpy difference between both codes being -5.5 kcal/mol (-12.5 vs.-18 kcal/ 
mol). Adaptor:template bonding strength underlies template decoding specificity. It is understandable, 
therefore, that path-distances indicate expansion from the NH 4 + Fixers Code resulted in amino acids, with a 
mean synthesis path-length of 4 steps, saturating the NCN column triplets, enabling a near-maximal mean 
bonding enthalpy of 17.5 kcal/mol. In addition, by retaining a compact pattern of codon assignments, a one- 
third reduction in translation-halting, ‘Bretscher 1 mutations resulted. 

With mean synthesis path-lengths extending 4, 5, and 7 steps per codon set, expansion beyond the NFU + - 
Fixers Code, at NAN set, proceeded in an NCN -» NGN -»• NUN order. Mean codon bond enthalpy 
accompanying expansion was -17.5, -16.1, and -14.5 kcal/mol per set (Fig. 1). This substantiates the 
proposition that selection acted to optimize translation fidelity, through assigning codons of highest bonding 
strength first (Grosjean and Westhof, 2016). With the threat posed by unassigned triplets greatest initially, 
and receding as code expansion advanced, assignment of weak-bonding NAN codons, with mean bonding 
enthalpy of -12.5 kcal/mol (Fig. 6) is understandable, given that amino acid synthetic order, based on path- 
distances, reveals the first code arose from pre-code translation on a poly(A) template, precluding 
competition with NCN, NGN and NUN codon sets. 

Columnwise organization of the code (Fig. 1), first noted by Woese (1965), is seen here as the outcome of 
an interplay primarily between code origin, from pre-code assembly of a few amino acids on a poly(A) 
template, the risk posed by untranslatable codons, and selection for codon bonding strength. 



7. Imprint of pre-code translation on the code 

The two diacid amino acids in proteins share the GA Y R box, forming the only short-path pair to occupy a 
split-box. With a mean bond enthalpy of -14 kcal/mol (Fig. 5a), GA» box codons would split during 3’-base 
recruitment, in the final stage of code formation (Sec. 2). Each forms on a 1-step path, and as first 
generation additions to the code, Asp 1 and Glu 1 conceivably shared a cofactor/adaptor, tRNA-!D Asp ’ GIU 3 cuu, 
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Fig. 7. Pathway extensions in expansion from the NH 4 + Fixers Code. They accompanied codon 
acquisition by newly formed amino acids and diversification of LUCA tRNA species, (a) Initial 
NH 4 + fixing reactions (paths i and ii) formed diacid amino acids, D 1 , E 1 , and their amides, N 2 , Q 2 . 
Together with Ter triplets, they are shown to have recruited all 16 NAN codons. With 5’-base identity 
determining codon specificity in the NH 4 + Fixers Code and 3’-base degeneracy, mutation to any of 48 
unassigned triplets, able to halt translation, decreases from ‘3 in 4’ (random array) to ‘1 in 3’ (mid-A 
substitution). Recruiting an NCN template triplet required a mid-base, A->C (adaptor U35-> G35) 
substitution, Extension of amide synthesis pathways, iii and iv, yield 4-step paths producing a Thr 6 
a-amino acid intermediate, homo-serine (hs 4 ), and imino acid, proline (P 4 ), respectively. Subsequent 
5’-base template substitutions, A->G and A->U, in hS 4 codons, ACN, form codons GCN and UCN 
for A 2 and S 4 , synthesized on paths-v and -vi, respectively, saturating the 16 NCN set. All transitions 
conserve cofactor/adaptor tRNA core structure (in brackets); apart from D^IDJ^N^IA]. (b) Consen¬ 
sus sequences of LUCA tRNA specific for A 2 and S 4 exhibit homology of 7.4 to 11.0 quarts (p = 3.5 x 
10' 5 - 2.4 x 10' 7 ) and 5.7 to 9.2 quarts (p = 3,7x1 O' 4 - 2.9x1 O' 6 ) with tRNA for N 2 and T 6 (T 6 is current 
terminal amino acid of former hS 4 path), (c) Synthesis path-i to -iv have linear extensions. Path-v 
and -vi result in A 2 and S 4 synthesis and they include takeover of pre-code, supernumerary reaction 
sequences from RCC and CT, together with N 2 pathway extensions. Letter and background colors 
indicate synthesis pathway precursor - oa, oxaloacetate; og, oxoglutarate; branch-point - py, 
pyruvate; pg, 3-phospho-glycerate. 

cognate for GA« codons, in the manner of pre-code translation of both amino acids on a poly(A) template 
(Davis, 1999, 2019). Formation of the NH 4 + Fixers Code, with the acquisition of codons in the XAN set, and 
subsequent expansion to XCN set codons are reconstructed in Fig. 7, from amino acid path-distances and 
homology between pre-divergence tRNA base sequences (Davis, 2008). 

Initial saturation of the sixteen XAN triplets (Davis, 1999), recruited the weakest set in the code (mean 
codon bonding enthalpy, -12.5 kcal/mol; Fig. 5). Formation of the first code plainly predated competition with 
strong bonding codons and translation at optimal fidelity. With translation originating on a simple, pre-code 
poly(A) template, resulting in synthesis of polyanionic, random sequence poly(Asp, Glu) polypeptides, the 
transition to a set of codons with an invariant mid-A indicates formation of the NH 4 + Fixers Code represented 
an intermediate stage in the transition to a code; insufficiently advanced to have optimized codon bonding 
strength. Expansion from the weak NAN triplets of the NH 4 + Fixers Code, by contrast, displays selection for 



codon bonding strength, in full force. As shown in Fig. 7, an equally compact code, incorporating NCN 
triplets - the strongest bonding codon column (bond enthalpy, -16 to -19 kcal/mol; Fig. 5) resulted. 

NH 4 + Fixers Code amino acids derive from oxoglutarate (Glu 1 , Gin 2 ), or oxaloacetate (Asp 1 , Asn 2 ). 

While Asn 2 has a type-IA tRNA, Asp 1 , Glu 1 , and Gin 2 have type-ID tRNA (Saks and Sampson, 1995). 
Flence, a type ID tRNA core was the apparent ancestral form (Davis 2006, 2008). With its A-rich codons, 
the NH 4 + Fixers Code indicates pre-code translation on a poly(A) template, makes AAA the ancestral codon. 
Since AAA codes for Asn 2 , in the NH 4 + Fixer Code (assignment of AAA to Lys 10 in the standard code 
followed from a later overprinting), the distinct core structure of tRNA-IA Asn uuucan be linked to direct 
competition for the AAA codon and elimination of the ancestral, ambivalent tRNA-ID Asp ’ GIU A A A cofactor/ 
adaptor; together with replacement of codomanticodon self-recognition in favor of base pair 
complementarity (Davis, 2019). 

Unlike the linear i - iv paths (Fig. 7c) yielding Asn 2 , Gin 2 , homo-Ser 4 , and Pro 4 , respectively, paths v and vi 
show oxaloacetate as an upstream precursor in pathways leading to Ala 2 and Ser 4 synthesis (Davis 2013). 
Close homology between pre-divergence tRNA-IA Ala 3C Gu and tRNA-IA Ser 3C Gu (pre-type II form) versus tRNA- 
IA Asn uuu (Fig. 7b) suggests oxaloacetate -> pyruvate, and oxaloacetate-> phosphoglycerate reaction 
segments (with carboxyl-bearing intermediates for cofactor tRNA attachment) were deleted during the 
protein takeover of amino acid synthesis. As segments upstream from current path branch-points include 
central metabolism reactions, predating amino acid synthesis, they were supernumerary to the path- 
distance metric of code evolution; incorporation of Val 5 synthesis reactions in the lie 7 pathway (Michal, 

1999) illustrates a path takeover in code formation. Homology between pre-species-divergence tRNA 
species for Ser 4 and Asn 2 , indicate type-11 tRNA Ser , bearing an enlarged variable loop, evolved some time 
after tRNA-IA’ Cys 3AC u had formed from its precursor cofactor/adaptor, tRNA-IA Ser 3A Gu- 

Emergence of tRNA-l l Ser allowed transfer of anticodon stem identity elements to the variable loop, freeing 
the adaptor to diversify the number of Ser 4 codons, at a time when unassigned, untranslatable triplets posed 
an existential threat to protein synthesis. Codons initially recruited for Ser 4 , by variant tRNA Ser species, were 
evidently acquired by Cys 5 , Gly 5 , Leu 7 , Arg 9 , Trp 14 , and Ter. In particular, a-isopropyl-malate, a dicarboxyl 



intermediate in Leu 7 synthesis, is the putative site of a tRNA exchange, resulting in Leu 7 acquiring a type-ll 
tRNA species from Ser 4 - its sole source - cognate with UUN and CUN (Davis, 2013). 

tRNA sequence homology, preceding species divergence (Davis, 2008), reveals three of the four amino 
acids saturating the set of strong NCN codons, on expansion from the NH 4 + Fixers Code, had type-IA 
cofactor/adaptors (Fig. 1a,b). They include the 4-step a-amino acid, homoserine, hS 4 , an intermediate in 
the synthesis of Thr 6 ; end-product of the full, 6-step path and a standard set amino acid. Single N36 
anticodon U->C and U-> A substitutions in tRNA-IA hS 3UGU , respectively, recruited codons GCN and UCN 

for Ala 2 and Ser 4 . Anticodon N35 U -> G substitutions in tRNA-IA Asn uuu and tRNA-ID Gln 3GU u, respectively, 

occurred in forming tRNA-IA hS UGU cognate with codons ACN, and tRNA-ID Pro 3GGU cognate with codons CCN. 
In expansion from the NH 4 + Fixers Code, codon bonding strength increased substantially, as indicated, from 
-12.5 to -17.5 kcal/mol (Fig. 5a), while mean amino acid synthesis path-length increased from 1.5 to 3.5 
steps, and mean accumulated codon substitutions rose from 0.5 to 1.5. The IA:ID ratio in cofactor/adaptors 
with each core structure changed from 1:2 (Asp 1 and Glu 1 ambiguously charging tRNA-ID Asp,G ' u 3C uu) to 3:1. 
Since this measure broadly indicates the relative proportion of amino acids in the Asp and Glu family within 
the code, it demonstrates code expansion resulted from disproportionate growth of the Aspartate family. 

8. Correlation between amino acid path-distance and phylogenetic depth 

Determinants of the order of entry of different kinds of amino acids into the code, and their distribution within 
it, have been identified above using the synthetic order of amino acids, based on path-length, as a metric of 
the advancement of code evolution. An alternative measure of the course of code evolution has been 
introduced that relies on amino acid phylogenetic-depth (Brooks et al., 2002). It extends standard 
phylogenetic techniques of protein analysis to sort amino acids into early-, mid-, or late-additions to the 
code, based on the direction and magnitude of change in residue frequency, on back-tracking to pre¬ 
divergence protein residue sequences In principle, both metrics of code evolution would be anticipated to 
align. 


Figure 8a shows there is a linear correlation between path-length and phylogenetic-depth among standard 



Path-length 


set amino acids, on excluding the four Glutamate family members: L = 13.7 - 7.5 D, where L denotes path- 
length and D is depth. With depth specified as the ratio of probabilities, Pi_ucA(i)/Pmod(i)> for residue, i, in a 
LUCA era protein versus the modern protein, code earlycomers Asp 1 , Asn 2 , and Ala 2 display, for example, 
depth values of 1.06 to 1.43, while latecomers Phe 11 , Tyr 11 , Trp 14 depth is 0.59 to 0.35 (Supplement Table 
S2). Differences in amino acid path-length, within the three depth categories of code formation (1, early; 2, 
mid; 3, late) were statistically significant, with probability, p = 3.2 x 10" 3 (F = 8.93, d.f. = 2), in an analysis of 
variance; early and late amino acid mean path-distances differed significantly in a Tukey’s test, p < 1 x 10' 2 . 



Fig. 8. Linear regression, with 95 per cent confidence interval, showing a contrasting relationship 
between synthesis path-length and phylogenetic-depth in each amino acid family, (a) Asp family, 
n = 16, and (b) Glu family, n = 4. Regions 1, 2, and 3 refer, respectively, to higher (early-comer), 
constant (mid-stage), and lower (late-comer) depth index for indicated residues in LUCA versus 
modern proteins (Supplement, Table S2). L signifies synthesis path-length and D phylogenetic- 
depth in the regression equations. 
















Phylogenetic depth and amino acid synthetic order were, conversely, negatively correlated in the 
Glutamate family, containing Glu 1 , Gin 2 , Pro 4 , and His 13 : L = -6.3 + 11.0 D. This incongruously places both 
short-path NH 4 + fixer amino acids, half the NH 4 + Fixers Code, among late-comers (category 3) to the code, 
while Pro 4 and His 13 were, respectively, categorized as intermediate (category 2), and early (category 1) 
additions to the code. The Glutamate Family is comparatively small, only one-fifth of the standard set, and 
this will be seen to be significant, together with phylogenetic evidence, noted in the previous section, on the 
diversification of bifunctional cofactor/adaptor tRNA species in the pre-LUCA era. 

The quantitative agreement demonstrated between path-distance and phylogenetic-depth (Fig. 8a) 
validates both as metrics of code evolution. It remains to be clarified, however, why they should be aligned 
for amino acids with oxaloacetate, pyruvate, phosphoglycerate, and phosphoenolpyruvate as precursors, 
but disaligned for those derived from oxoglutarate. 

9. Antiquity of NH 4 + fixer amino acids and their tRNA 

Prokaryotes conserve tRNA-dependent pathways for the synthesis of both diacid/amide pairs of NH 4 + fixer 
amino acids (Wilcox and Nirenberg, 1968; Schon et al., 1988; Curnow et al., 1996; Gagnon et al., 1996). 
This supports them being the conserved root of a once extensive network (Cork and Purugganan, 2004) of 
tRNA-dependent amino acid synthesis pathways. Both NH 4 + fixer diacid/amide pairs thus formed before 
most standard set amino acids. 

Figure 9 depicts a tree of tRNA species based on homology between consensus base sequences, pre¬ 
dating species divergence (Davis, 2008). Its relation to code evolution is inferred from the path-distance 
synthetic order of each cognate amino acid. All four NH 4 + fixer cofactor/adaptor tRNA species cluster at the 
tree root, consistent with forming the first code. Other amino acids, with longer, later, and less-integrated 
synthesis pathways, evidently discarded their tRNA cofactor during the protein takeover, with tolerable 
disturbance to the network of synthesis pathways. 

Tree nodes join each tRNA species and its antecedent, identified from base sequence homology. Cognate 
amino acid path-length determined branch height (Fig. 9). Conspicuously more, 14 versus 2, tRNA species 



precursor 
oa □ 

og □ 
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py □ 

pg □ 
pep □ 


Fig. 9. Phylogenetic tree of LUCA tRNA species showing substantially greater diversification of Asp 
family tRNA, from a root of diacid and amide amino acid adaptor/cofactors. Nearest-neighbor tRNA 
having closest identity between consensus sequences form each node; post-LUCA variations and 
universally conserved nucleotides excluded. Cognate amino acid synthesis path-length, from the 
reductive citrate cycle, specifies the stage of code formation at each node, marking the origin of a 
tRNA species. Fourteen amino acids, encoded by 17 sets of codons, charge tRNA derived from Asn 2 
tRNA. Two amino acids, each with a single set of codons, charge tRNA derived from tRNA Gln . 
Enclosed at the tree root are tRNA for both diacid amino acids (D, E) and their amides (N, Q), 
highlighting their retention of tRNA-dependent amino acid synthesis in prokaryotes. This supports 
their being the conserved root of an early network of tRNA-dependent amino acid synthesis 
pathways. Each pathway precursor, or branch point, in central metabolism is indicated. A 
discontinuous line indicates intermediates that lack an a-carboxyl for cofactor/adaptor tRNA 
attachment. Leu 8 and Arg 9 pathways contain a dicarboxyl intermediate, providing a site for tRNA 
exchange, resulting in anomalous adaptor and codon assignments. An over-bar shows type/sub¬ 
type of tRNA for each amino acid, using Saks and Sampson (1988) classification. Letter colors mark 
amino acid precursor - oa, oxaloacetate; og, oxoglutarate; and branch-point - py, pyruvate; pg, 3- 
phospho-glycerate; pep, phosphoenolpyruvate. 






























diversified from tRNA-IA Asn than tRNA-ID Gln . Although Glu 1 is precursor to Gin 2 , Pro 4 , and Arg 9 , a tRNA 
exchange at a triple carboxylated intermediate, arginino-succinate (step-8), is credited with Arg 9 acquiring an 
Asn family cofactor/adaptor, with a type-IA core structure, yielding isoacceptors cognate with codons AGR 
and CGN in the Asn 2 and Gin 2 code domains, respectively (Davis, 2013). With His 13 synthesis originating at 
D-ribose-5-phosphate, the amino acid did not plausibly acquire a Glu cofactor/adaptor until histidinol- 
phosphate synthesis, 2 steps prior to His 13 formation. 

Thus Gin 2 and Pro 4 are the only fully tRNA-dependent members of the Glutamate family. It is an open 
question, therefore, whether the 13 step His synthesis path provides a reasonable measure of its time-of- 
entry to the code. As a first generation addition to the code, the CAN box initially coded for Gin 2 in the NH 4 + 
Fixers Code. Acquisition of the CAY doublet by His 13 places its addition to the code in the final stage of its 
formation (Fig. 1). Early His 13 entry to the code could not, in view of this, be the source of its 1.67 depth 
index, as reported by Brooks et al. (2002). Although it is apparent, from Fig. 8a, that depth measurements 
convey information on amino acid time-of-entry into the code, Glutamate family results suggest it is 
responsive to other factors influencing pre- and post-LUCA protein residue frequencies. 

Figure 10 depicts the difference in cumulative number of conserved (non-universal) sites, and identical 
bases at these sites, in consensus base sequences of Aspartate and Glutamate family LUCA tRNA species 
(Davis, 2008). tRNA cognate for amino acids in the Asp family (Fig. 9) jointly conserved 13.4 ± 0.81 (m 
± s.e.m.) sites of tRNA-IA Asn 3U uG (Fig. 10a) - with universally conserved sites, anticodon, and 
cofactor/adaptor self-homology excluded. By contrast, only 4.9 ± 0.61 tRNA Gln 3 ' G uu sites were jointly 
conserved with Asp family tRNA. Most jointly conserved tRNA sites retained the same base: 10.4 ± 

1.01 in tRNA-IA Asn and 4.1 + 0.35 in tRNA Gln 3G uu (Fig. 10c). The mean number of jointly conserved sites 
and homologous bases arising between Glu family tRNA species and tRNA-IA Asn 3U uG, or tRNA Gln 3 ' GU u, 
were, respectively, 6.8 ± 2.25 and 4.9 + 0.61 for sites, and 3.6 + 0.75 and 2.7 + 0.67 for bases (Fig. 
10b,d). Sites and bases homology exhibited by Asp and Glu family tRNA with tRNA-IA Asn 3 U uG and 
tRNA-ID Gln 3GU u differed significantly: F si tes(df 3, 34) = 26.4, p = 5.28x1 O' 9 , and F ba ses(df 3, 34) = 17.6, 
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Fig. 10. Cumulative number of conserved (non-universal) sites and homologous bases in Aspartate 
and Glutamate family tRNA species matched with tRNA-IA Asn 3U uG, and tRNA-D Gln 3GU u- Single-letter 
abbreviations, on abscissa, give amino acid specificity of each tRNA; Asn and Gin tRNA (boxed) 
self-homology excluded. Two-letter abbreviations denote each amino acid precursor, or branch¬ 
point. Mean increments (with standard error) are shown. Site and base homology in Aspartate 
family tRNA, with tRNA-IA Asn 3 UUG , significantly exceed that of other distributions (see Table S3). 
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p = 4.54x1 CT 7 in an analysis of variance, Mean site and base homology were ranked in sequential 
difference tests as: (tRNA-IA Asn 3 U uGV. Asp family tRNA) > (tRNA-IA Asn 3U uG v. Glu family tRNA) 

* (tRNA Gln 3G uu v. Asp family tRNA) * (tRNA Gln 3GU u v. Glu family tRNA). 

Cofactor/adaptor tRNA of sixteen amino acids with aligned path-distance and depth code evolution 
metrics (Fig 8a), from four synthesis families with oxaloacetate, pyruvate, phosphoglycerate, or 
phosphoenolpyruvate as precursor, are revealed by the phylogenetic tree in Fig 9, and corroborated by 
cumulative sequence homology in Fig. 10, to have diversified from tRNA-IA Asn . 


10 Aspartate family and code duality 

Diversification of Aspartate family cofactor/adaptors from tRNA-!A Asn may be seen as evidence that the 
underlying family of amino acid synthesis pathways concurrently diversified from oxaloacetate, oa. Central 
metabolism intermediates upstream of each pathway branch-point retain an invariant, free carboxyl (Fig. 

11), allowing attachment of a tRNA cofactor and extension of each amino acid synthesis pathway back to oa 
- the primal precursor (Davis, 2013). Retention of free carboxyl groups by amino acid synthesis 
intermediates and, by extension, free invariants in central metabolism pathways, were revealed to be the 
conserved vestige of a cofactor, or scaffold, attachment site (Davis, 2013, 2015, 2018). Since central 
metabolism reactions predate translation, their role in bridging the gap between a primal precursor, notably, 
oa, and amino acid synthesis path branch-point, leaves the path-distance metric unaltered; as exemplified 
by the take-over of a Val 5 path segment in the synthesis of lie 7 (Davis, 1999). 

Arranging Aspartate family amino acids by synthesis branch-point and path-length (Table 2), reveals oa 
as direct precursor to seven amino acids. It contributed amino acids at each stage of code evolution, 
involving successive recruitment of codon 5’-, mid-, and 3’-site bases (Fig. 1). Beyond initial NH 4 + fixing 
reactions resulting in conversion of oa to Asp 1 , and its amide Asn 2 , and extensions of this pathway, the 
Aspartate family network of amino synthesis pathways is seen to have expanded, by utilizing pyr and 3-pg, 
as branch-points in the tRNA-dependent synthesis of six expansion-phase amino acids, with path-lengths of 



Table 2. Souce of Aspartate Family amino acid precursors at 
indicated stagesof codon base recruitment 


source 

RCC 

CT 

RPC 

CT 

precursor 1 

oa 

pyr 

3-pg 

pep 


COOH 

CHz 
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C=° 

CHs 

C=0 

HC-O-P 

HC-OH 

CHz 

II 

C-O-P 

invariant 

COOH 

COOH 

COOH 

COOH 

tRNA 
specificity 
and codon 
base stage 

Asp 1 

Asn 2 

5’- 



Thr 6 

lie 7 

Met 7 

Ala 2 

Val 5 

Leu 8 

Ser 4 

Cys 5 

Gly 5 mid- 


(Arg 9 ) 2 

Lys 10 



Phe 11 

Tyr 11 

Trp 14 3'- 


1 Arrows indicate direction of precursor recruitment, based on amino acid path-lengths. 

2 

Arg 9 acquired an Asn-related cofactor/adaptor in a tRNA exchange (Davis, 2013). 


2 to 8 steps. They entered the code during its expansion, when codon mid-site bases were recruited to 
encode incoming amino acids. In the final stage of code evolution, the 3’-base in weak-bonding, pre¬ 
assigned codons, was recruited to encode latecomer amino acids. Phosphoenolpyruvate, pep, was branch¬ 
point in the long pathways yielding three large aromatic amino acids, Phe 11 , Tyr 11 , and Trp 14 . Addition of 
Lys 10 , synthesized on a path originating at oa, and capture of Arg 9 , completed the Aspartate family. 

Two unequal subsets of amino acids have been identified within the standard set, based on the 
distribution of aligned and disaligned path-distance and phylogenetic-depth metrics (Fig. 8), cofactor/ 
adaptor tRNA sequence homology (Figs. 9, 10), and primal precursor to each network of tRNA-dependent 
synthesis pathways (Fig. 1, Table 2). At the base of this split in the standard set are NH 4 + fixer diacid/amide 
amino acid pairs, AspVAsn 2 and GluVGIn 2 . It may be noted that each pair shares aminoacyl-tRNA 














synthetases of the same class: aaRS-l charge Glu 1 and Gin 2 , and aaRS-ll charge Asp 1 and Asn 2 . While both 
classes of synthetase are the product of code-directed translation, with comparatively late-stage residue 
motifs (Davis, 2008), their distribution among the NH 4 + fixers provides compelling evidence for a duality 
among ribozymal synthetases at the origin of the genetic code. 

11. Discussion 

Occurrence of a direct correlation between amino acid path-distance and phylogenetic-depth validates the 
application of both metrics of code evolution (Fig. 8a) to reconstruction of its formation, during pre-LUCA 
evolution of template-directed protein synthesis. Restriction of the alignment to sixteen amino acids derived 
from oxaloacetate, pyruvate, phosphoglycerate, or phosphoenolpyruvate, with metric disalignment among 
oxoglutarate derived (Glutamate family, His 13 included) amino acids (Fig. 8b), additionally, furnished 
evidence of a deep, early duality within standard set amino acids. Phylogenetic analysis of cofactor/adaptor 
tRNA sequences (Fig. 9), after elimination of variations accrued (Eigen et al., 1989) in the long interval 
following species divergence, and their cumulative site and base homology with tRNA-IA Asn 3 ’uuG (Fig. 10), 
reveal tRNA, cognate for amino acids with aligned code metrics, diversified from the asparagine 
cofactor/adaptor. The continuity of advancement in code formation, apparent from the linear relation 
between path-distance and depth (Fig. 8a), suggests that oxaloacetate was the primal precursor of amino 
acid sub-families, with synthesis pathways branching from central metabolism at pyruvate, phospho¬ 
glycerate, or phosphoenolpyruvate (Table 2), forming a 16 member Aspartate family (Supplement, Fig. SI). 

NH 4 + fixers, Glu 1 and Gin 2 , have diminished frequency in pre-divergence residue sequences (Fig. 8b; 
Supplement, Table S2), implying late addition to the code (Brooks et al., 2002), notwithstanding, synthesis 
on 1 to 2 step paths (Fig. 1). This contributed to path-distance and depth being disalinged in Fig. 8b. Glu 1 
contributes substantially, as a donor in transamination reactions, to amino acid synthesis (Michal, 1999). 
Under-representation of Glu, and its amide, in proteins formed during code formation, could be a direct 
result of its utilization in transamnation. Standard set amino acids can be then seen as a product of both 
Aspartate and Glutamate families, to the extent that oxaloacetate served as a primal precursor and Glu 1 
participated in transamination for their synthesis. 



With a depth index, pi_ucA/Pmod, of 1.67 ± 0.17 (mid-range ± half-spread), His 13 exhibits the steepest rise in 
pre-divergence residue frequency, while paradoxically having a long, 13-step synthesis metric. An additional 
incongruency arises with it having the doublet, CAR, within a codon box also coding for an NH 4 + fixer 
earlycomer, Gin 2 . Its long synthesis path and split-box codons indicate that the elevated pre-LUCA 
incorporation of His 13 , apparent from its depth-index, does not result from early inclusion in the code. With a 
catalytic potential of 189 (Guttridge and Thornton, 2005; Davis, 2013), far in excess of other amino acids, its 
appearance, at a late stage of code formation, plausibly, initiated a surge in ‘acid-base’ enzyme formation. A 
consequent pre-LUCA rise in His 13 incorporation would account for its elevated depth index; phylogenetic 
depth directly assesses pre-LUCA amino acid incorporation levels and, while not strictly deterministic, they 
generally correlate with the time-order of amino acid addition to the code, based on the path-distance 
synthetic order metric (Fig. 8) and LUCA cofactor/adaptor tRNA homology (Fig. 9, 10). 

Codon bias detected in pre-divergence genes indicated assignment of triplet 4-set, UCN, to Ser 4 preceded 
allocation of doublet AGY (Diaz-Lazcoz et al., 1995). Conserved 4-set codons of Ser 4 also significantly 
exceeded conserved 2-set codons (conserved/non-conserved frequency index > 1.0) in the results reported 
for six species by Brooks and Fresco (2003, Table 4), with y 2 df=1 3.87, p = 4.91 xIO" 2 , notwithstanding an 
Aeropyrum pernis (an archaean) preference for 2-set Ser 4 codons: number of conserved/non-conserved 
codons 4-set, 14/10, and 2-set, 2/10. Present evidence reveals initial assignment of UCN triplets to Ser 4 
contributed to saturating the NCN column (Fig. 7), minimizing the risk of mutations to unassigned, 
untranslatable triplets, during expansion from the NH 4 + Fixers code, at NAN column (Sec. 6). Assignment of 
NCN column triplets prior to NGN triplets, conforms with selection for codon bonding strength during code 
evolution (Grosjean and Westhof, 2016); intact UCN box triplets have an estimated bonding enthalpy of 
-16 kcal/mol versus -14 kcal/mol for split-box AG Y R box triplets (Fig. 5a). 

With the first code located at the NAN triplet column, it had a mean codon bonding enthalpy estimated at 
-12.5 kcal/mol (Fig. 1). Back-tracking from its invariant mid-A indicates code formation was preceded by 
translation on a poly(A) template, with a ratchet-equipped ribosome and non-specific tRNA-ID adaptor. In 
this event, the initial stages of code formation were shielded from inter-codon competition, particularly since 
non-NAN triplets were untranslatable. Vertical (5’-site coding) intra-NAN column diacid/amide amino acid 



domains, in the NH 4 + Fixers code, reduced the risk of mutation to an untranslatable triplet from 3/4 (random 
array) to 1/3 (restriction to mid-base substitutions). After allowance for history and ‘lethal’-error avoidance (a 
result of starting small), selection for codon bonding strength represented a significant determinant of code 
formation (Grosjean and Westhof (2016). 

An analysis of codon-bias expanded, beyond ordering-in-time the doublet and quadruplet codon set of a 
single amino acid, to whole rows of the code indicated that recruitment of GNN codons by AspVGIu 1 , Ala 2 , 
Gly 5 , and Val 5 had preceded ANN codon acquisition by Asn 2 /Lys 10 , Thr 6 , Ser 4 /Arg 9 , Met 7 /lle 7 (Brooks and 
Fresco, 2003). Additional to the horizontal direction of time in code evolution, reflecting mid-base 
substitutions linked to amino acid synthetic order (Fig. 1), this development introduces a vertical component 
(5’: G-> A) of time in code formation. The former is most apparent among related amino acids in both rows: 
Ala 2 GC» -> Val 5 GU« (pyruvate derived) and Asn 2 AA» -> Thr 6 AC»-> lle 7 /Met 7 AU» (oxaloacetate 
derived). One of four GNN code boxes is split, AspVGIu 1 , and path-distance invariant. Three ANN boxes, 
however, are split, with Asn 2 /Lys 10 and Ser 4 /Arg 9 , and exhibit 5- to 8-step differences. This difference shifts 
the mean path-distance metric from 2.8 ± 0.92 steps for GNN row amino acids to 6.4 ± 1.04 steps in the 
ANN row, providing a source of the vertical time component in code evolution, as path-lengths of ANN row 
amino acids significantly exceed those of GNN amino acids: p= 1.63 x 10" 2 in a t-test. When long-path ANN 
amino acids, in split-boxes, are removed, however, the ANN mean amino acid path length, 4.8 ± 1.11, does 
not differ significantly from the GNN mean. p = 0.11: the vertical component of time in GNN versus ANN 
rows vanishes. The proposition that GNN row amino acids entered the code before those in the ANN row 
solely reflects overprinting of ANN codon boxes, in the final stage of code formation. It is notable that the 
horizontal time-metric is oriented in the same direction for both these rows: GNN, C-> U (Ala 2 -> Val 5 ), 
and ANN, A-> C-> U (Asn 2 -> Thr 6 -> Met 7 ). Both amino acid hydrophobicity and codon bonding strength 
(after NAN), likewise, share the horizontal orientation of synthesis path-length (Fig. 1). Mid-base recruitment 
during expansion from the NH 4 + Fixers Code is seen to have occurred in a compact, columnwise pattern, 
necessitated by the risk of an unreadable triplet stalling translation. 

Back-tracking from the NH 4 + Fixers Code has been noted to indicate the genetic code arose from 
translation on a poly(A) template involving a ratchet-equipped ribosome and tRNA-ID cofactor/adaptor, 



bearing either diacid amino acid, and resulting in random sequence polyanionic polypeptides (Davis, 2006). 
A polymeric source of N atoms, particularly following residue amidation, resulted, within a primal metabolic 
system at a ferrogeneous cationic surface (Wachtershauser, 1992; Dodd et al., 2017). Introduction of a 
UUU triplet, within the template strand, opened the way for synthesis of polypeptide chains of optimal length 
(Davis, 2019). The small first code, with codons restricted to 5’-specificity, a mid-A, and degenerate 3’-site 
(Fig. 1), required only four tRNA species, bearing a generic 3’-XUU anticodon, following the proposed 
transition from base self-recognition to complementarity within the proofing center (Davis, 2019), and 
retention of ambiguous Asp 1 and Glu 1 incorporation, specified by the GAN codon set. This simple code 
reduced the initial risk of mutation to an untranslatable triplet from % (random code) to !4 (mid-base change 
to NH 4 + Fixers Code). 

Code expansion retained this error-minimizing, compact pattern, by recruiting codons in a columnwise 
manner, through successive mid-base recruitment, C-> G-> U, in an order consistent with selection for 
codon bonding strength (Figs. 5, 6; Grosjean and Westhof, 2016). Synthetic order based on path-distance 
(Fig. 2), broadly equivalent to phylogenetic-depth (Fig. 8), determined the order of amino acid addition to the 
code. From the four polar amino acids and Ter signal of the NH 4 + Fixers Code, with 16 NAN set triplets (1 
code column), addition of ten increasingly hydrophobic amino acids during expansion (Fig. 2) resulted in a 
total of 60 assigned triplets (4 code columns, less codon box CGN). Pre-LUCA proteins furnished an 
attractor to drive incorporation of the increasingly hydrophobic amino acids during code expansion. 

A 23-residue antecedent of low potential ferredoxin, pro-Fd-[5] (Davis, 2002), reconstructed by N0rgaard 
(2009) and N0rgaard et al. (2009), with a code stage 5.6 ± 0.4 residue profile, combined a negatively 
charged 6-residue ‘foot’ (mean transfer free energy 4.9 ± 2.39 kcal/mol, values from Tolstrup et al. (1994)) 
with a non-polar, 17-residue [4Fe-4S] 2+/+ cofactor-binding segment (mean transfer energy, -0.83 ± 0.71 
kcal/mol, see Supplement, Fig S2). Its structure conforms with one of a protein adaptor, designed to anchor 
a bound cofactor to a cationic mineral surface. As the Fajan-Paneth principle stipulates, the hydrophobic 
cofactor binding segment would contribute to lengthening the attachment interval to a cationic mineral 
surface, by this small, ancient metallo-protein, enhancing its [4Fe-4S] 2+/+ cofactor participation in a 



primal metabolism, as formulated by Wachtershauser (1992). This mid-expansion phase protein amplifies 
the evident anchoring of N atoms by pre-code, random sequence, polyanionic poly(Asp\ Glu 1 ), suggesting 
that other encoded proteins, similarly, served as cofactor anchors, aiding cofactor participation in an early 
form of metabolism. Pre-LUCA H + ATPase proteolipid hi subunit has a code age of 7.0 ± 0.83, indicating 
membrane encapsulated cells had evolved by the final stages of code expansion. The highly hydrophobic 
residues of this transmembrane protein have a mean transfer free energy of -1.9 ± 0.25 kcal/mol. 

Six of eight code split-boxes occur within the hydropathy clusters centered on NAN and NUN triplets. At 
the completion of code expansion, triplets in these columns, respectively, encoded four polar NH 4 + fixers as 
first generation amino acids (Asp 1 , Glu 1 , Asn 2 , Gin 2 ), plus a Ter signal, and four non-polar amino acids 
acquired in the final stage of code expansion (Val 5 , lie 7 , Met 7 , Leu 8 ). Addition of six long-path amino acids, 
bearing basic and cyclic side chains followed. This entailed codon reassignment, usually as 3’- pyrimidine/ 
3’-purine doublets, in codon boxes with sub-threshold bonding enthalpy, exceeding -16 kcal/mol (Fig. 5a). 

All four codon boxes of the NAN triplet cluster were split by polar amino acids, resulting in (Asp 1 , Glu 1 ), 
(Asn 2 , Lys 10 ), (Gin 2 , His 13 ) and (Ter, Tyr 11 ) polar pairs. Both NUN triplet boxes were split by non-polar amino 
acids, yielding (Leu 8 , Phe 11 ) and (Met 7 , lie 7 ) non-polar pairs. Three of four post-expansion polar amino acids 
were thus acquired by polar NAN cluster and two of three non-polar amino acids were acquired by the non¬ 
polar NUN cluster to yield a homology distribution with a probability of random formation of 5.4 x 10" 3 (Fig. 
3b). Far higher probabilities arise when evaluating code structure as the product of a ‘frozen accident’ (Gilis 
et al. 2001; Goodarzi et al., 2004), From the present perspective, the code grew incrementally, beginning 
small, then expanding through an interplay of factors 
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Supplement 


1. Abbreviations 
Reaction cycles 

FRC, formose reaction cycle; RPC, reductive pentose-phosphate cycle; RCC, reductive citrate cycle. 

Central metabolism: precursors: oa, oxaloacetate; og, 2-oxoglutarate. branch points: py, pyruvate; pe, 
phosphoenol-pyruvate; gp, 2-phosphoglycerate; pg, 3-phosphoglycerate. 

Amino acids Standard three-letter and single-letter amino acid designations are used. Lower-case, double¬ 
letter abbreviations identify synthesis intermediates and their source (Figs. 1, SI). 

Alanine, Ala, A 

Arginine, Arg, ff;ng, N-acetyl-glutamate; np, N-acetyl-glutamate-phosphate; ns, N-acetyl-glutamate-y- 
semi-aldehyde; no, N-acetyl-ornithine; or, ornithine; cn, citrulline; rs, arginino-succinate. 

Asparagine, Asn, N 
Aspartate, Asp, D 
Chain termination, Ter, X 
Cysteine, Cyst C 
Glycine, Gly, G 
Glutamate, Glu, E 
Glutamine, Gin, Q 

Histidine, His, H: pp, phosphatidyl-ribosyl-pyro-phosphate; pt, phospho-ribosyl-adenosine-triphosphate; 
rm, phosphoribosyl-adenosine-mono-phosphate; ro, phospho-ribosyl-formimino-amino-imidazole- 
carboxamideribose-phosphate; ru, phospho-ribulosyl-formimino-amino-imidazole-carboxamide-ribose 
phosphate; ig, erythro-imidazole-gylcerol-phosphate; ia, imidazole-acetol-phosphate; hp, histidinol- 
phosphate; ho, histidinol. 

Isoluecine, lie, I: kb, a-keto-butyrate; ab, a-aceto-a-hydroxy-butyrate; dv, a,(B-dihydroxy-iso-valerate; 
Leucine, Leu, L: pm, a-isopropyl-malate; im, p-isopropyl-malate; ic, a-keto-isocaproate. 

Methionine, Met, M: sh, o-succinyl-homoserine; he, homocysteine. 

Lysine, Lys, AC/ dl, a,(3-dihydro-picolineate; pd, A1-piperdiene-2,6-dicarboxylate; sk, N-succinyl-e-keto-a- 
amino-pimelate; sa, N-succinyl- a,£-diamino-pimelate; dp, a,£-diamino-L-pimelate; mp, meso- a£-diamino- 
pimelate. 

Phenylalanine, Phe, F: pf, prephenate; fp, phenyl-pyruvate. 

Proline, Pro, P : gs, glutamate-y-semialdehyde; pc, A1-pyrroline-5'-carboxylate. 

Serine, Ser, S: pp, phospho-hydroxypyruvate; ps, phospho-serine. 

Threonine, Thr, T: ap, aspartyl-phosphate; as, aspartate-p-semialdehyde; hs, homoserine; ph, o-phospho- 
homo-serine. 

Tryptophan, Trp, W : ah, p-deoxy-arabino-heptulosonate-7-phosohate; dq, 5-dehydroquinate; ds, 5- 
dehydro-shikimate; sk, shikimate; kp, shikimate-5-phisohate; es, 3-enol-pyruvyl-shikimate-5-phosphate; ca, 
chorismate; aa, anthranilate; ra, N-phospho-ribosyl- anthranilate; cr, 1-(o-carboxyphenyl-amino)-1'- 
deoxyribulose-5-phosphate; ip, indole-3-glycerol-phosphate; in, indole. 

Tyrosine, Tyr, Y: hf, p-hydroxy-phenyl-pyruvate. 

Valine, Val, V : al, a-aceto-lactate; dv, a,(B-dihydroxy-iso-valerate; kv, a-keto-isovalerate. 

Nucleotides 

Adenosine, A; Cytosine, C; Guanosine, G; Uridine, U. Triplet coding site, X. 



2. Synthesis pathways in amino acid families 
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Fig. SI. Aspartate and Glutamate family tRNA-dependent amino acid synthesis pathways. Homology 
with tRNA-IA Asn and carboxylated intermediates upstream from each path branch-point supports 
oxaloacetate (oa) being primal precursor to four Asp family subfamilies. Small Glu family retained 
its primal precursor, oxoglutarate. An over-bar gives path-length from precursor, P, with pre-formed 
RCC and CT steps discounted. Two-letter abbreviations denote amino acid intermediates; red bar 
signifies a free a-carboxyl (tRNA attachment site), and blue triangle signifies an a-amine. Both 
families include a-amino acid intermediates, indicative of pathway end-product incorporation. 
Neighboring codons read by related tRNA, cognate for sibling amino acids, form a domain; 
quasidomains result on tRNA exchange. Letter and background colors identify precursors and 
branch-points. The number of path pre-LUCA enzymes is in brackets. Adapted from Davis (2013). 














































































































































3. Hydropathy of protein residues in code expansion phase 
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Fig. S2 (a). Residue hydropathy in ferredoxin antecedent pro-Fd-[5]. Pre-LUCA low potential 
ferredoxin antecedent, pro-Fd-[5], contains a negatively charged 6-residue (red), N-terminal foot, 
with mean hydropathy transfer free energy (AG T ) of 4.9 ± 2.39 kcal/mol (dashed line), connected to a 
17-residue C-terminal cofactor binding segment (green) (Davis, 2002; Norgraard, 2009, Norgaard et 
al., 2009). Four Fe atoms coordinately bond to sulfur atoms in cysteine residues (blue) to form a 
cubic [4Fe-4S] 2+/+ cluster. With a mean transfer free energy of -0.82 ± 0.71, the hydrophobicity of the 
long cofactor binding segment would prolong attachment, by Fajan-Paneth’s principle, of this small, 
ancient metallo-protein to a cationic mineral surface. Amino acid side-chain AG T values are from 
Tolstrup et al. (1994). (b) An 11 residue segment with a late code expansion stage profile identified 
in the H + -ATPase transmembrane proteolipid helix-1 (Davis, 2002). It has a mean transfer free energy 
of -1.9 + 0.25 kcal/mol, allowing it to partition with an early cell membrane. 














4. Hydropathy clusters within the genetic code 
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Fig. S3. Amino acid transfer free energy distribution within the standard code. Codon mid-base 
clusters of hydrophilic and hydrophobic amino acids within the standard code, in relation to 
synthesis path-length (superscripts). Among eight split boxes, five are between a short-path (2 to 7 
step paths) and long-path (9 to 14 step paths) amino acid, indicative of pre-assigned codon boxes 
overprinted by long path amino acids, in the final stage code evolution. Transfer free energy, AG T , 
values are from Tolstrup et al. (1994). 


































5. H Bond Enthalpy 


Table SI. Agreement between dimer pair and inferred H-bond enthalpies 


Category Dimer Enthalpy 1 Mean enthalpy Aggregate bond enthalpy 2 3 
pairs (kcal/mol) (m ± S.e.m.) - C-H-0 + C-H-0 


(kcal/mol) 


1. Weak 5’-AA 

UU 

5’-AU 

UA 

5’-UA 

AU 

2. Moderate 5’-CU 

GA 

5’-CA 

GU 

5’-GU 

CA 

5’-GA 

CU 

3. Strong 5’-CG 

GC 

5’-GG 

CC 

5’-GC 

CG 


-7.09 

-9.11 

-8.50 -8.23 ± 0.60 

-10.90 
-11.03 

-11.98 

-13.21 -11.78 10.53 

- 10.88 
-14.18 

-16.04 -13.70 1 1.51 


-8.20 -10.96 


-11.63 -12.66 


-14.36 -14.36 


1 From expanded pairs listed in Table 3 of Chen et al. (2012). 

2 Enthalpy values in this table were used to construct Fig 4 with FI-bond enthalpies. 

3 N-H-0 = -2.73, N-H-N = -1.72, and C-H-0 = 1.03 kcal/mol 






6. Time-order of amino acid addition to the genetic code 
Table S2. Amino acid path distance and phylogenetic depth values 1 


Amino 

acid 

Path 

length 2 3 

Depth 

PLUCA/Pmod 

Category 

Ala 

2 

1.428±0.070 

1 

Asn 

2 

1.183±0.113 


Asp 

1 

1.064±0.063 


Gly 

5 

1.070±0.050 


His 

13 

1.670±0.167 


lie 

7 

1.139±0.150 


Ser 

4 

1.124±0.112 


Thr 

6 

1.282±0.086 


Val 

5 

1.200±0.069 


Met 

7 

1.105±0.116 

2 

Arg 

9 

1.037±0.070 


Pro 

4 

0.984±0.058 


Cys 

5 

0.383±0.077 

3 

Gin 

2 

0.571 ±0.089 


Glu 

1 

0.914±0.054 


Leu 

8 

0.643±0.034 


Lys 

10 

0.799±0.049 


Phe 

11 

0.590±0.056 


Trp 

14 

0.354±0.059 


Tyr 

11 

0.464±0.047 



1 Path-length and depth values that form Fig. 8 correlation. 

Glu family amino acids (colored) were treated separately (see text) 

2 See Figs. 1 and SI. 

3 Pluca, Pmod are residue frequencies attributed to LUCA and modern 
proteins by Brooks et al. (2002). Depth values are mid-range ± half-range. 





7. Amino acid family affiliation of LUCA tRNA species 

Table S3. Distribution of conserved sites and base identity in LUCA tRNA of aspartate 
and glutamate amino acid family tRNA versus tRNA-IA Asn 3U uG and tRNA-ID Gln 3G uu 


Amino 


tRNA-IA^^uuG 


tRNA-ID Gln 3Guu 


acid 

No. 

sites 

Running No. Running 

total bases total 

No. 

sites 

Running No. 
total bases 

Running 

total 






Aspartate family 




D 

9 

9 

4 

4 

3 

3 

3 

3 

N 

- 

9 

- 

4 

10 

13 

6 

9 

T 

20 

29 

16 

20 

10 

23 

6 

15 

1 

12 

41 

10 

30 

3 

26 

2 

17 

M 

18 

59 

17 

47 

3 

29 

3 

20 

R 

(3-UCU) 

10 

69 

10 

57 

3 

32 

3 

23 

K 

16 

85 

16 

73 

4 

36 

4 

27 

A 

16 

101 

9 

82 

6 

42 

4 

31 

V 

11 

112 

7 

89 

4 

46 

4 

35 

L 

12 

124 

6 

95 

2 

48 

2 

37 

S 

(3-AGU) 

14 

138 

12 

107 

6 

54 

6 

43 

C 

8 

146 

7 

114 

5 

59 

5 

48 

G 

15 

161 

10 

124 

7 

66 

5 

53 

W 

10 

171 

8 

132 

4 

70 

4 

57 

F 

17 

188 

14 

146 

3 

73 

3 

60 

Y 

13 

201 

10 

156 

6 

79 

6 

66 

Total 

201 


156 


79 


66 


Mean 

13.4 


10.4 


4.9 


4.1 






Glutamate family 




E 

7 

7 

3 

3 

2 

2 

2 

2 

Q 

10 

17 

6 

9 

- 

2 

- 

2 

P 

6 

23 

3 

12 

4 

6 

4 

6 

H 

4 

27 

3 

15 

3 

9 

2 

8 

Total 

27 


15 


9 


8 


Mean 

6.8 


3.8 


3 


2.7 



dumber of jointly conserved (non-universal) tRNA sites in LUCA base sequences of cofactor/ 
adaptors for amino acids in Aspartate and Glutamate Families and number of these sites 
exhibiting base identity with pre-divergence asparagine and glutamine tRNA sequences. 
Values listed are derived from Davis (2008, Table 2). 









Supplement References 


Brooks DJ, Fresco J, Lesk AM, Singh M 2002. Evolution of amino acid frequencies in proteins over 
deep time: Inferred order of introduction into the genetic code. Mol. Biol. Evol. 19, 1645-1655. 

doi: 10.1093/oxfordiournals.molbev.a003988 

Chen JL, Dishler AL, Kennedy SD, Yildirim I, Liu B, Turner DFI, Serra MJ 2012. Testing the nearest 
neighbor model for canonical RNA base pairs: revision of GU parameters Biochemistry 51, 3508- 
3522. doi.orq/10.1021/bi3002709 

Davis BK 2002. Molecular evolution before the origin of species. Prog. Biophys. Mol. Biol. 79, 77-133. 

Davis BK 2008. Imprint of early tRNA diversification on the genetic code: Domains of contiguous 
codons read by related adaptors for sibling amino acids. In: Takeyama T. editor. Messenger RNA 
Research Perspectives. New York: Nova Science, pp. 35-79. 

Norgaard H 2009. Characterizion of ancient ferredoxins & Chlamydia Trachomatis ribonucleotide 
reductase. Ph.D. Thesis. Technical University of Denmark, Lyngby. 
https://orbit.dtu.dk/files/5O5Q569/Hanne%20N0rqaard.pdf 

Norgaard H, Helt SS, Ooi BL, Hagen WR, Christensen HEM 2009. Spectroscopic characterization of 
evolutionary old ferredoxins. J. Biol. Inorg. Chem. 14, Supp. 1,212, P581 
https://doi.ora/10.1007/s00775-009-0515-7 

Tolstrup N, Toftgard J, Engelbrecht J, Brunak S 1994. Neural network model of the genetic code is strongly 

correlated to the GES scale of amino acid transfer free energies. J. Mol. Biol. 243, 816-820. 






